Fake Review Detection: Classification and Analysis of Real and Pseudo Reviews

Authors

  • Arjun Mukherjee
  • Vivek Venkataraman
  • Bing Liu
  • Natalie Glance

Abstract

In recent years, fake review detection has attracted significant attention from both businesses and the research community. Because reviews should reflect genuine user experiences and opinions, detecting fake reviews is an important problem. Supervised learning has been one of the main approaches to solving it. However, obtaining labeled fake reviews for training is difficult, because it is very hard, if not impossible, to label fake reviews reliably by hand. Existing research has therefore used several types of pseudo fake reviews for training. Perhaps the most interesting type is pseudo fake reviews generated with the Amazon Mechanical Turk (AMT) crowdsourcing tool. Using AMT-crafted fake reviews, [36] reported an accuracy of 89.6% using only word n-gram features. This high accuracy is quite surprising and very encouraging. However, although fake, the AMT-generated reviews are not real fake reviews on a commercial website. The Turkers (AMT authors) are unlikely to be in the same psychological state of mind while writing such reviews as the authors of real fake reviews, who have real businesses to promote or demote. Our experiments support this hypothesis. Next, it is natural to compare fake review detection accuracies on pseudo AMT data and on real-life data, to see whether different states of mind result in different writing and, consequently, different classification accuracies. For real review data, we use filtered (fake) and unfiltered (non-fake) reviews from Yelp.com (the closest available to ground-truth labels) to perform a comprehensive set of classification experiments, again using only n-gram features. We find that fake review detection on Yelp's real-life data gives only 67.8% accuracy, but this accuracy still indicates that n-gram features are indeed useful.
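A supervised n-gram classification setup of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses lowercased word unigrams and bigrams as features and swaps in a multinomial Naive Bayes classifier with add-one smoothing in place of whatever learner the experiments used. The helper `ngrams`, the class `NaiveBayesNgram`, and the toy reviews are all hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercased word n-grams for n = 1..n_max (unigrams + bigrams by default)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


class NaiveBayesNgram:
    """Hypothetical multinomial Naive Bayes over n-gram counts, add-one smoothing."""

    def __init__(self, n_max=2):
        self.n_max = n_max

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text, self.n_max))
        # Shared vocabulary across all classes.
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        label_freq = Counter(labels)
        self.log_prior = {c: math.log(label_freq[c] / len(labels)) for c in self.classes}
        return self

    def _log_score(self, grams, c):
        v = len(self.vocab)
        score = self.log_prior[c]
        for g in grams:
            if g in self.vocab:  # ignore n-grams never seen in training
                score += math.log((self.counts[c][g] + 1) / (self.totals[c] + v))
        return score

    def predict(self, text):
        grams = ngrams(text, self.n_max)
        return max(self.classes, key=lambda c: self._log_score(grams, c))


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the gold label."""
    return sum(model.predict(t) == y for t, y in zip(texts, labels)) / len(labels)


# Toy, made-up training data (not from the paper).
train_texts = [
    "best hotel ever amazing stay highly recommend",
    "amazing amazing best experience ever recommend to everyone",
    "room was clean but the street noise kept us up",
    "decent breakfast, slow check in, parking was expensive",
]
train_labels = ["fake", "fake", "real", "real"]
model = NaiveBayesNgram().fit(train_texts, train_labels)
print(model.predict("best stay ever highly recommend"))  # fake
```

Naive Bayes is used here only because it fits in a few dependency-free lines; any linear classifier over the same n-gram counts would illustrate the same idea.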
We then propose a novel and principled method to discover the precise difference between the two types of review data, using the information-theoretic measure KL-divergence and its asymmetric property. This reveals some very interesting psycholinguistic phenomena about forced and natural fake reviewers. To improve classification on the real Yelp review data, we propose an additional set of behavioral features about reviewers and their reviews for learning, which dramatically improves the classification result on real-life opinion spam data.
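One way to make the KL-divergence comparison concrete is sketched below, assuming add-one smoothed unigram distributions F (fake reviews) and N (non-fake reviews) over a shared, whitespace-tokenized vocabulary. Because KL-divergence is asymmetric, KL(F||N) and KL(N||F) generally differ, and the per-word term F(w) ln(F(w)/N(w)) minus N(w) ln(N(w)/F(w)), which simplifies to (F(w) + N(w)) ln(F(w)/N(w)), shows which words drive the difference. The function names, smoothing, and tokenization are assumptions for illustration, not the paper's exact procedure.

```python
import math
from collections import Counter


def unigram_dist(texts, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(w for t in texts for w in t.lower().split() if w in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl(p, q):
    """KL(P || Q) = sum over w of P(w) * ln(P(w) / Q(w)), in nats."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def kl_asymmetry(fake_texts, nonfake_texts):
    """Return KL(F||N), KL(N||F), and each word's share of their difference."""
    vocab = {w for t in fake_texts + nonfake_texts for w in t.lower().split()}
    f = unigram_dist(fake_texts, vocab)
    n = unigram_dist(nonfake_texts, vocab)
    # F(w) ln(F/N) - N(w) ln(N/F) == (F(w) + N(w)) ln(F(w) / N(w)):
    # positive for words relatively more frequent in the fake reviews.
    contrib = {w: (f[w] + n[w]) * math.log(f[w] / n[w]) for w in vocab}
    return kl(f, n), kl(n, f), contrib


# Toy corpora (made up): the two directions coincide only because they mirror each other.
kl_fn, kl_nf, contrib = kl_asymmetry(["a a b"], ["a b b"])
```

Summing the per-word terms recovers KL(F||N) minus KL(N||F), so sorting words by their term is a simple way to see which vocabulary is over-represented on each side.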
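The abstract does not list the behavioral features, so the sketch below uses a few plausible reviewer-level signals chosen purely for illustration: the maximum number of reviews a reviewer posts on one day, the fraction of 4- and 5-star ratings, average review length in words, and average absolute deviation from each product's mean rating (scaled to [0, 1] by dividing by 4 on a 1-to-5 scale). The `Review` record, the `behavioral_features` function, and all feature names are hypothetical; such values would typically be appended to the n-gram feature vector before training.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Review:
    """Hypothetical review record."""
    reviewer: str
    product: str
    day: date
    stars: int  # 1-to-5 star rating
    text: str


def behavioral_features(reviews):
    """Map each reviewer to a dict of illustrative behavioral features."""
    # Mean rating of each product, used to measure rating deviation.
    ratings = defaultdict(list)
    for r in reviews:
        ratings[r.product].append(r.stars)
    product_mean = {p: sum(s) / len(s) for p, s in ratings.items()}

    by_reviewer = defaultdict(list)
    for r in reviews:
        by_reviewer[r.reviewer].append(r)

    features = {}
    for name, rs in by_reviewer.items():
        per_day = Counter(r.day for r in rs)
        k = len(rs)
        features[name] = {
            "max_reviews_per_day": max(per_day.values()),
            "positive_ratio": sum(r.stars >= 4 for r in rs) / k,
            "avg_length_words": sum(len(r.text.split()) for r in rs) / k,
            # Divide by 4, the largest possible gap on a 1-to-5 scale.
            "avg_rating_deviation": sum(abs(r.stars - product_mean[r.product]) for r in rs) / k / 4,
        }
    return features


# Toy, made-up reviews: u1 posts three 5-star reviews on a single day.
reviews = [
    Review("u1", "p1", date(2012, 5, 1), 5, "great place"),
    Review("u1", "p2", date(2012, 5, 1), 5, "loved it so much"),
    Review("u1", "p3", date(2012, 5, 1), 5, "best"),
    Review("u2", "p1", date(2012, 5, 2), 1, "cold food and rude staff"),
]
feats = behavioral_features(reviews)
```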

Similar resources

Detection of Fake Accounts in Social Networks Based on One Class Classification

Detection of fake accounts on social networks is a challenging process. The previous methods in identification of fake accounts have not considered the strength of the users’ communications, hence reducing their efficiency. In this work, we are going to present a detection method based on the users’ similarities considering the network communications of the users. In the first step, similarity ...

What Yelp Fake Review Filter Might Be Doing?

Online reviews have become a valuable resource for decision making. However, its usefulness brings forth a curse ‒ deceptive opinion spam. In recent years, fake review detection has attracted significant attention. However, most review sites still do not publicly filter fake reviews. Yelp is an exception which has been filtering reviews over the past few years. However, Yelp’s algorithm is trad...

Spotting Fake Reviews using Positive-Unlabeled Learning

Fake review detection has been studied by researchers for several years. However, so far all reported studies are based on English reviews. This paper reports a study of detecting fake reviews in Chinese. Our review dataset is from the Chinese review hosting site Dianping, which has built a fake review detection system. They are confident that their algorithm has a very high precision, but they...

Sentiment Analysis Based Online Restaurants Fake Reviews Hype Detection

In our daily life, fake reviews of restaurants on e-commerce websites greatly affect consumers' choices. By categorizing the set of fake reviews, we have found that fake reviews from hype make up the largest part, and this type of review always misleads consumers. This article analyzes all the characteristics of fake reviews of hype and finds that the text of the review always tel...

Analyzing and Detecting Opinion Spam on a Large-scale Dataset via Temporal and Spatial Patterns

Although opinion spam (or fake review) detection has attracted significant research attention in recent years, the problem is far from solved. One key reason is that there is no large-scale ground truth labeled dataset available for model building. Some review hosting sites such as Yelp.com and Dianping.com have built fake review filtering systems to ensure the quality of their reviews, but the...

Publication date: 2013